feat(sdk): add otel support#177

Merged
namrataghadi-galileo merged 30 commits into main from feature/61578-add-otel-support
May 8, 2026

@namrataghadi-galileo
Contributor

Summary

Added configurable observability sink selection for the Python SDK and server so control events can be routed to the default backend, registered custom sinks, or named sink factories.

Added a built-in OpenTelemetry sink for the SDK, including OTLP configuration via settings/environment variables.

Updated docs and exports so custom sink registration and OTEL usage are available as public integration points.

Scope

User-facing/API changes:

  • Added observability_sink_name and observability_sink_config to SDK initialization/configuration
  • Exposed sink registration helpers and OTEL conversion utilities from the Python SDK public API
  • Added documented support for the SDK otel extra and OTEL-related environment variables
  • Added server observability sink selection/config support for named backends
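
For illustration only, here is a minimal sketch of how name-based sink selection with registered factories could work. The PR conversation does not show the SDK's real helper names or signatures, so `register_sink_factory`, `resolve_sink`, `ControlEventSink`, and `ListSink` are all hypothetical; only the idea of pairing a sink name with a config mapping comes from the bullets above.

```python
from __future__ import annotations

from typing import Any, Callable, Protocol


class ControlEventSink(Protocol):
    """Hypothetical sink interface: anything that can accept control events."""

    def write(self, event: dict[str, Any]) -> None: ...


# Hypothetical alias: a factory builds a sink from its config mapping.
SinkFactory = Callable[[dict[str, Any]], ControlEventSink]

_FACTORIES: dict[str, SinkFactory] = {}


def register_sink_factory(name: str, factory: SinkFactory) -> None:
    """Register a named factory (illustrative helper, not the SDK's API)."""
    _FACTORIES[name] = factory


def resolve_sink(name: str, config: dict[str, Any] | None = None) -> ControlEventSink:
    """Build the sink selected by name, failing loudly on unknown names."""
    try:
        factory = _FACTORIES[name]
    except KeyError:
        known = ", ".join(sorted(_FACTORIES)) or "<none>"
        raise ValueError(f"unknown sink {name!r}; registered: {known}") from None
    return factory(config or {})


class ListSink:
    """Toy sink that keeps events in memory, tagged with a configured prefix."""

    def __init__(self, config: dict[str, Any]) -> None:
        self.prefix = config.get("prefix", "")
        self.events: list[dict[str, Any]] = []

    def write(self, event: dict[str, Any]) -> None:
        self.events.append({**event, "source": f"{self.prefix}sdk"})


register_sink_factory("memory", ListSink)

# Mirrors the shape of observability_sink_name / observability_sink_config.
sink = resolve_sink("memory", {"prefix": "test-"})
sink.write({"control": "pii-check", "result": "allow"})
```

Raising on an unknown name, rather than silently falling back to the default sink, is a design choice of this sketch and not something the PR states.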

Internal changes:

  • Introduced shared sink-selection models/registry helpers in telemetry/
  • Refactored SDK observability to resolve active sinks dynamically and support named/custom sinks
  • Refactored server startup/shutdown to resolve observability backends through sink selection
  • Added test coverage for sink selection, lifecycle handling, OTEL sink behavior, and re-init/policy refresh interactions

Out of scope:

  • No changes to core control evaluation semantics
  • No UI changes
  • No new non-OTEL external sink implementation included beyond registration hooks

Risk and Rollout

Risk level: medium

Rollback plan: Revert the sink-selection/OTEL changeset to restore the previous default SDK-to-server observability path only. If needed, disable custom routing by keeping observability_sink_name=default and not installing/configuring the otel extra.

Testing

  • Added or updated automated tests
  • Ran make check: typecheck was attempted, but the full local make check was not run because uv was unavailable in this environment
  • Manually verified behavior

Checklist

  • Linked issue/spec (if applicable)
  • Updated docs/examples for user-facing changes
  • Included any required follow-up tasks

@namrataghadi-galileo namrataghadi-galileo changed the base branch from main to feature/61576-add-config-driven-sink-selection April 16, 2026 22:36
@codecov

codecov Bot commented Apr 16, 2026

Contributor

@lan17 lan17 left a comment

I like having a built-in OTEL path, but I don't think this version is safe to merge yet. Right now the SDK can tell callers observability is working while the OTEL sink is effectively inert, and the caching logic doesn't compose cleanly with the new otel_* settings. I'd also mark failed control executions as OTEL errors rather than only attaching an exception object.

Comment thread sdks/python/src/agent_control/otel_sink.py
Comment thread sdks/python/src/agent_control/otel_sink.py
Comment thread sdks/python/src/agent_control/otel_sink.py
@namrataghadi-galileo
Contributor Author

@lan17 Thanks, this was a good catch. I fixed the three unsafe parts you called out:

  • The built-in otel sink no longer counts as active when it can’t actually export. If OTEL is disabled, deps are missing, or there’s no exporter configuration, we now treat that sink as inert for observability selection instead of reporting observability as working.
  • The named-sink cache now keys off the effective resolved OTEL config, so changes to otel_enabled / otel_endpoint / otel_headers / otel_service_name rebuild the cached OTEL sink instead of reusing a stale one.
  • Failed control executions now mark the OTEL span status as ERROR in addition to recording the exception.

I also added regression tests for the inert-sink case, the OTEL settings cache invalidation case, and the error-status span behavior. Targeted SDK observability/OTEL tests are passing locally.
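
As a rough illustration of the first two fixes, the sketch below keys a single-slot cache on the effective settings and treats a sink that cannot export as inert. Only the four `otel_*` field names come from this thread; `OtelSettings`, `OtelSink`, `get_otel_sink`, the `deps_available` flag, and the default service name are assumptions made for the example, and exporter wiring is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OtelSettings:
    """Effective OTEL settings; frozen so instances compare and hash by value."""

    otel_enabled: bool = False
    otel_endpoint: str | None = None
    otel_headers: tuple[tuple[str, str], ...] = ()
    otel_service_name: str = "agent-control"  # hypothetical default


class OtelSink:
    """Stand-in for the built-in sink; exporter wiring is omitted."""

    def __init__(self, settings: OtelSettings, deps_available: bool = True) -> None:
        self.settings = settings
        self.deps_available = deps_available

    @property
    def is_active(self) -> bool:
        # Inert unless enabled, importable, and pointed at an endpoint.
        s = self.settings
        return s.otel_enabled and self.deps_available and bool(s.otel_endpoint)


_cached: tuple[OtelSettings, OtelSink] | None = None


def get_otel_sink(settings: OtelSettings) -> OtelSink | None:
    """Reuse the cached sink only if the settings match; None if it is inert."""
    global _cached
    if _cached is None or _cached[0] != settings:
        _cached = (settings, OtelSink(settings))  # settings changed: rebuild
    sink = _cached[1]
    return sink if sink.is_active else None


local = OtelSettings(otel_enabled=True, otel_endpoint="http://localhost:4318")
first = get_otel_sink(local)
again = get_otel_sink(local)  # same settings: cached sink is reused
rebuilt = get_otel_sink(
    OtelSettings(otel_enabled=True, otel_endpoint="http://collector:4318")
)
inert = get_otel_sink(OtelSettings(otel_enabled=True))  # no endpoint configured
```

Using a frozen dataclass as the cache key means any change to one of the four fields produces a different key, which is one simple way to avoid reusing a stale sink.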

Base automatically changed from feature/61576-add-config-driven-sink-selection to main April 22, 2026 20:41
Contributor

@lan17 lan17 left a comment

Nice work on tightening up the earlier OTEL issues. I found one remaining opt-out case that I think should be fixed before merge: an explicit observability_enabled=False can still leave the selected OTEL sink active for later writes.
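
A hedged sketch of the opt-out behavior being asked for: the enabled flag is checked on every write, so a later `observability_enabled=False` stops events even though a sink was already selected. The `Observability` and `RecordingSink` classes and their methods are hypothetical and do not reflect the actual code in `observability.py`.

```python
from __future__ import annotations

from typing import Any


class RecordingSink:
    """Toy sink that remembers every event it receives."""

    def __init__(self) -> None:
        self.events: list[dict[str, Any]] = []

    def write(self, event: dict[str, Any]) -> None:
        self.events.append(event)


class Observability:
    """Hypothetical holder for the enabled flag and the selected sink."""

    def __init__(self, sink: RecordingSink, enabled: bool = True) -> None:
        self._sink = sink
        self.enabled = enabled

    def configure(self, *, observability_enabled: bool) -> None:
        self.enabled = observability_enabled

    def active_sink(self) -> RecordingSink | None:
        # Resolved on every write so a later opt-out takes effect immediately.
        return self._sink if self.enabled else None

    def emit(self, event: dict[str, Any]) -> bool:
        sink = self.active_sink()
        if sink is None:
            return False  # opted out: drop the event
        sink.write(event)
        return True


recorder = RecordingSink()
obs = Observability(recorder)
sent_before = obs.emit({"control": "pii-check"})
obs.configure(observability_enabled=False)
sent_after = obs.emit({"control": "pii-check"})
```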

Comment thread sdks/python/src/agent_control/observability.py
@namrataghadi-galileo namrataghadi-galileo requested a review from lan17 May 7, 2026 23:52
Contributor

@lan17 lan17 left a comment

Thanks for working through the review feedback, Namrata. I did a fresh pass from the latest head, including the opt-out path and the OTEL sink/cache behavior, and this looks good to me now.

@namrataghadi-galileo namrataghadi-galileo enabled auto-merge (squash) May 8, 2026 15:02
@namrataghadi-galileo namrataghadi-galileo merged commit 9530368 into main May 8, 2026
6 checks passed
@namrataghadi-galileo namrataghadi-galileo deleted the feature/61578-add-otel-support branch May 8, 2026 15:02
2 participants